End-to-End Evaluation of Machine Interpretation Systems: A Graphical Evaluation Tool
Authors
Abstract
VERBMOBIL, a long-term project of the Federal Ministry of Education, Science, Research and Technology, aims at developing a mobile translation system for spontaneous speech. The source-language input consists of human speech (English, German or Japanese); the translation (bidirectional English-German and Japanese-German) and the target-language output are produced by the VERBMOBIL system. Owing to the innovative character of the project, new methods for end-to-end evaluation had to be developed by a subproject established especially for this purpose. In this paper we present criteria for the evaluation of speech-to-speech translation systems and a tool for judging translation quality, the Graphical Evaluation Tool (GET).

1 This work was funded by the Federal Ministry of Education, Science, Research and Technology (BMBF) in the framework of the VERBMOBIL project under Grant 01 IV 101 A/O and, in the framework of the SFB 538 Mehrsprachigkeit (Collaborative Research Center No. 538 Multilingualism), by the Deutsche Forschungsgemeinschaft (DFG). The responsibility for the contents of this study lies with the authors.
2 To simplify the presentation of this paper, we refer only to the language pair German-English.

Introduction

The performance of an evaluation is very often driven by the characteristics of the system to be judged (Andenfilger, 1994). For the VERBMOBIL project, the evaluation should meet three aspects:
• the needs of the developers,
• the needs of the user,
• the constraints on the evaluation of translation quality in general.
In our concept and performance of the evaluation we tried to combine these three aspects, but one should keep in mind that the constraints on translation quality in general were originally meant to describe human translation with all its varieties and specific stylistic features. In the special case of machine interpretation, only texts from limited domains can be transferred so far, so in our view it seems legitimate to simplify some of the procedures applied to the evaluation of human translation. An evaluation method based on any well-known standard (EAGLES, 1995; Sparck Jones and Galliers, 1996; Manzi, 1996) could not have integrated the three aspects cited above, as traditional evaluation methods are intended for comparative evaluations rather than for the investigation of a system during its development. Therefore, to meet our requirements, we developed an integrated methodology and a tool for speech-to-speech quality evaluation which also allows easy access to the data.
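As a rough illustration of the kind of data a graphical evaluation tool has to manage, the sketch below models a per-turn quality judgement for a speech-to-speech translation. The record fields, the three-way rating scale and the helper function are illustrative assumptions for this sketch, not the scheme actually used in VERBMOBIL or GET.

```python
# Hypothetical sketch of per-turn translation quality judgements as a
# GET-like tool might store them; all names and the rating scale are assumed.
from dataclasses import dataclass
from enum import Enum


class Judgement(Enum):
    CORRECT = "correct"            # meaning fully preserved
    APPROXIMATE = "approximate"    # meaning only partly preserved
    INCORRECT = "incorrect"        # meaning lost or distorted


@dataclass
class TurnEvaluation:
    dialogue_id: str       # recorded dialogue the turn belongs to
    turn_id: int           # position of the turn within the dialogue
    source: str            # transcribed source-language utterance
    translation: str       # target-language output of the system
    judgement: Judgement   # human judgement of the translation quality


def correctness_rate(turns: list[TurnEvaluation]) -> float:
    """Share of turns judged fully correct: one simple end-to-end figure."""
    if not turns:
        return 0.0
    return sum(t.judgement is Judgement.CORRECT for t in turns) / len(turns)
```

Storing judgements per dialogue turn in this way would also cover the "easy access to the data" requirement mentioned above, since aggregate figures can be recomputed from the same records.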
Similar resources
Performance Evaluation of Parallel Programs on the Data Diffusion Machine
A tool set for the monitoring and performance evaluation of parallel programs has been developed for the Data Diffusion Machine (DDM), a virtual shared memory architecture. The tool set has a layered structure, allowing the user to observe the machine at various levels of detail. The tools are built on top of a software emulation of the DDM. This emulator provides realistic timings because certain part...
A Graphical Pronoun Analysis Tool for the PROTEST Pronoun Evaluation Test Suite
We present a graphical pronoun analysis tool and a set of guidelines for manual evaluation to be used with the PROTEST pronoun test suite for machine translation (MT). The tool provides a means for researchers to evaluate the performance of their MT systems and browse individual pronoun translations. MT systems may be evaluated automatically by comparing the translation of the test suite pronou...
Constructing a clinical curriculum evaluation tool based on community orientation strategy (A guide for application)
Introduction: SPICES is an approach to assist curriculum planners in planning, reviewing or revising a curriculum. However, few published papers in the literature describe the application of the SPICES criteria to curriculum evaluation. The goal of this study was the development of a curriculum evaluation tool based on the community-based strategy of the SPICES model. Method: This developmental ...
The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language
Machine Translation Evaluation Metrics (MTEMs) are a central component of Machine Translation (MT) engines, as engines are developed through frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages are still in question. The aim of this research study was to examine the validity and assess the quality of MTEMs from the Lexical Similarity set on machine tra...
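The validity check described in this abstract amounts to measuring how well automatic metric scores track human judgements. A minimal sketch of such a segment-level correlation follows; the score lists are made up for illustration and are not data from the cited study.

```python
# Minimal sketch: Pearson correlation between automatic metric scores and
# human ratings, the usual way metric validity is quantified.
from statistics import mean


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical segment-level metric scores and human adequacy ratings.
metric_scores = [0.42, 0.55, 0.31, 0.68]
human_ratings = [3.0, 4.0, 2.0, 4.5]
print(pearson(metric_scores, human_ratings))
```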
Evaluation of Machine Translation and its Evaluation
Evaluation of MT evaluation measures is limited by inconsistent human judgment data. Nonetheless, machine translation can be evaluated using the well-known measures precision, recall, and their average, the F-measure. The unigram-based F-measure has significantly higher correlation with human judgments than recently proposed alternatives. More importantly, this standard measure has an intuitive ...
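For readers unfamiliar with the measure, a minimal sketch of the unigram-based F-measure follows: unigram precision and recall are computed against a single reference translation (with clipped counts) and combined as their harmonic mean. The example sentences are made up.

```python
# Minimal sketch of a unigram-based F-measure against one reference translation.
from collections import Counter


def unigram_f_measure(candidate: str, reference: str) -> float:
    """Unigram precision and recall, combined as their harmonic mean (F1)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())   # matching unigrams, clipped per type
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


print(unigram_f_measure("the cat sat on the mat", "a cat sat on a mat"))  # ~0.67
```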
Journal title:
Volume  Issue
Pages  -
Publication date: 2000